Applying Asymptotic Shapes to Nonexponential Families

By 

Gideon  Schwarz 

1. Introduction: Asymptotic Shapes and Fortus' Generalization.

Asymptotic shapes were introduced in an earlier paper (1962). They
arise as follows. First, when a statistical hypothesis H_0 is to be
tested against an alternative H_1, on the basis of sequentially sampled
independent observations that cost c units each, the optimal procedure
is related to the posterior stopping risk R. When the latter reaches a
value less than c, the optimal (Bayes) procedure will obviously call
for stopping; for "separated" hypotheses, we have also shown (for some
c_0 and K) that as long as R exceeds K c log(1/c), where c < c_0, the
optimal procedure leads to taking another observation. For these facts,
which can be conveniently expressed as inclusions of events

(R < c) ⊂ (optimal procedure stops) ⊂ (R < K c log(1/c)),

no further assumptions are required.

For the second step, the distributions of the observations were assumed
to form a (k-dimensional) exponential family. For this case, the three
events forming the chain of inclusions above can be interpreted as sets in
the (k+1)-dimensional space of S(X_1) + ... + S(X_n), the (k-dimensional)
sufficient statistic of the first n observations, with n itself forming
the (k+1)-st coordinate. It was then shown that, as c → 0, the two sets
at the ends of the chain grow at the rate of log(1/c), and if this growth
is counteracted by shrinking them at that rate, both tend to the same
limit-set; hence the same holds for the optimal stopping set sandwiched
between them, if it too is rescaled by shrinking it log(1/c)-fold.
The limit-set is the asymptotic shape, and blowing it back up to log(1/c)
times its size yields an approximation to the optimal stopping set. In
terms of the generalized likelihood ratio statistic λ_i for testing H_i
against its complement (i = 0, 1), the approximate stopping region is the
set where at least one of λ_0 and λ_1 exceeds 1/c.
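
To make the approximate stopping rule concrete, here is a minimal Python sketch for one special case that is not taken from the paper: unit-variance normal observations with H_0: θ ≤ θ_0 and H_1: θ ≥ θ_1. The function names, the normal model, and the orientation of the likelihood ratio (supremum over the complement of the hypothesis divided by supremum over the hypothesis) are illustrative assumptions.

```python
import math


def log_glr(mean, n, boundary, side):
    """Log generalized likelihood ratio for a unit-variance normal mean.

    side = +1 tests the hypothesis theta <= boundary against its complement,
    side = -1 tests the hypothesis theta >= boundary against its complement.
    The result is positive when the sample mean lies outside the hypothesis.
    """
    gap = side * (mean - boundary)
    return math.copysign(0.5 * n * gap * gap, gap)


def should_stop(total, n, theta0, theta1, c):
    """Approximate rule: stop when either ratio exceeds 1/c.

    total is the sum of the first n observations (the sufficient statistic),
    c is the cost per observation (assumed: 0 < c < 1).
    """
    mean = total / n
    threshold = math.log(1.0 / c)
    log_l0 = log_glr(mean, n, theta0, +1)  # evidence against H_0
    log_l1 = log_glr(mean, n, theta1, -1)  # evidence against H_1
    return max(log_l0, log_l1) > threshold
```

With θ_0 = -1, θ_1 = 1 and c = 0.01, a sample mean of 0 does not trigger stopping at n = 5 but does at n = 10, since the statistic grows linearly in n while the threshold log(1/c) stays fixed.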

Recently Fortus (1979) attempted to do away with the restriction
to exponential families. As those are characterized by the existence
of a vector-valued statistic that is sufficient when summed over the
observations, Fortus chose a function-valued statistic to play a similar
role: the log likelihood function. In the linear (∞-dimensional) space
of these functions, with one dimension added for n, stopping regions
and regions of constant posterior risk are well-defined. The concept of
shrinking (by log(1/c)) is meaningful here as well, and so asymptotic
shapes are obtained, and the approximate procedure that results from
replacing the actual shape by the asymptotic shape is defined by Fortus
just as in the exponential-family case, and can be expressed in terms of
λ_0 and λ_1 here as well.

An important improvement added by Fortus to his generalization is
the proof of local uniformity of the convergence of the scaled region to
its asymptotic shape.

2. Interpreting the Convergence and its Uniformity.

In the exponential case, when the stopping risk R is regarded as
a function of Σ = S(X_1) + ... + S(X_n) and n, the domain of this function
consists properly of those pairs (Σ, n) which are attained by some possible
sequence X_1, ..., X_n. It is convenient to extend R to all pairs where
its formula is meaningful. We (1962) mentioned one part of this extension
(the inclusion of noninteger n values) but failed to mention that in
some cases, such as for integer-valued S, we assumed R to be defined
as if S too were real valued. Fortus (1979) proceeds likewise.

Only with the domain thus extended is the geometric description of
the various regions and shapes valid, and this must be kept in mind when
one attempts to evaluate one characteristic feature of the asymptotic
shapes method: as n tends to infinity, the mean sufficient statistic
T_n (equal to Σ/n in the exponential case and to the log likelihood divided
by n in Fortus' case) is held fixed.

For the convergence of the scaled regions to the asymptotic shape,
the fixing of T_n is merely a technical device, made appropriate by
the fact that the regions grow in all directions at the same asymptotic
rate of log(1/c) when c tends to zero. However, when the asymptotic
shape is to be used in a real problem, where c is small but positive,
T_n will never be fixed as n increases, and the justification of using
the approximate procedure depends on two further results. One is the
local uniformity in T_n of the convergence. This result is new in
Fortus (1979) even for the exponential case. But, to utilize the local
uniformity in T_n for an evaluation of the asymptotic procedure, T_n
must be shown to remain in a set for which the uniformity is valid, as
sampling proceeds.

For interior parameter points of an exponential family, the required
behavior of T_n is guaranteed by the law of large numbers: there T_n
is the mean of n independent identically distributed vectors S(X_i),
with finite moments of all orders; consequently it converges almost surely
to its expectation. In Fortus' case, T_n is function valued, and even
pointwise, T_n(θ) may not converge. For E(T_n(θ)) is the Kullback-Leibler
distance between the uniform distribution and the distribution under θ,
and this distance may be infinite. In such a case T_n will not stay in
a bounded set, and the uniformity will not apply.

We therefore add one condition to Fortus' assumptions: For X_1, ..., X_n
independent, distributed according to one of the distributions in the
parameter space, the mean log likelihood T_n(θ | X_1, ..., X_n) stays almost
surely in a set (of functions) that is bounded (in the metric
sup|exp f - exp g|); or equivalently, the set

{exp(T_n(θ | X_1, ..., X_n)) - exp(T_m(θ | X_1, ..., X_m))}

is almost surely bounded.

Whenever this condition holds, Fortus' description of the approximation
as "reasonable... for small c" can be justified by applying his uniformity
result. For the practical application, there is still the question of how
small is "small".

3.  The  Second  Order  Correction  to  the  Size. 

At the end of his paper, Fortus quotes Fushimi (1967), who found
in numerical examples that for c = 10^{-8} the approximation is still
far from reasonable. The limitation imposed thereby on the application
is seen to be less severe if one considers that c is the cost of an
observation in units of the penalty for a wrong decision, and that
c = 10^{-8}, e.g., corresponds to sample sizes of the order of magnitude
of log 10^8, which is less than 20.
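
The arithmetic behind this remark can be checked directly. The snippet below is only a sketch: it assumes natural logarithms and takes the leading-order size factor to be log(1/c); the function name is a hypothetical helper.

```python
import math


def leading_order_size(c):
    """Leading-order size factor log(1/c), natural logarithm assumed."""
    return math.log(1.0 / c)
```

For c = 10^{-8} this evaluates to about 18.4, consistent with the bound of 20 quoted above.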

Fushimi proceeds to find a second order correction for the
one-dimensional normal case, with linear loss; subsequently we generalized
it to other one-dimensional exponential families and other loss functions
(1969), and also to some higher-dimensional exponential families (1971).

In one sense these results are incomplete: the second order corrections
for the two regions that flank the optimal region in the chain of
inclusions in Section 1 differ from each other asymptotically by
log log(1/c), and therefore the optimal region cannot be approximated
by this method any closer than log log(1/c). This is also the order of
magnitude of the correction term, so not much seems to be gained by
including it. Still, using it one can approximate the optimal stopping
region with an error term equal to (1/2) log log(1/c) + O(1), while without
it, the error contains higher multiples of the log log term, i.e. at
least (3/2) log log(1/c) in the case treated by Fushimi. Since the regions
are in (Σ, n)-space, the error mentioned above corresponds to an error
proportional to log log(1/c) in sample size, or to c log log(1/c) in cost.
In either description, the relative error is asymptotically
((1/2) log log(1/c)) / log(1/c).
For the one-dimensional case with losses proportional to the squared
distance from the indifference region, the relative error would be
five times as large if the second order correction were ignored (see
figure).
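
To get a feeling for the magnitudes involved, the relative error can be tabulated numerically. This is a hedged sketch, not part of the original analysis: it assumes natural logarithms, drops the O(1) term, and uses a hypothetical helper whose `multiple` argument is the coefficient of the log log term (1/2 with the correction, 5/2 for the squared-distance case without it).

```python
import math


def relative_error(c, multiple=0.5):
    """multiple * log log(1/c) / log(1/c), natural logarithms assumed.

    Meaningful only for c < 1/e, so that log log(1/c) is positive.
    """
    size = math.log(1.0 / c)
    return multiple * math.log(size) / size
```

For c = 10^{-8} this gives roughly 0.08 with the correction and roughly 0.40 without it in the squared-distance case, which illustrates how slowly the relative error decays.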

For applications, the second order correction is clearly of crucial
importance. Since it varies with the dimension of the exponential family,
no one form will do for the general case. In fact, since it grows
proportionally with the dimension, it appears most necessary, yet least
accessible, when the dimension becomes infinite, as it may be under Fortus'
assumptions. It can be salvaged, however, if we retain an assumption of
finite-dimensionality less stringent than that of an exponential family.
In the latter, the dimension of the parameter space is also the linear
dimension of the log densities. The second order correction term
generalizes under some regularity assumptions to the case where the
parameter space is Euclidean k-space, as we now proceed to exemplify by
the case k = 1.

So, we let θ be real valued, and strengthen Fortus' continuity
assumption by requiring the likelihood function to be unimodal and to
possess bounded second derivatives, a condition that holds automatically
in the exponential case. Also, we assume the hypotheses to be half-lines
separated by a finite interval (θ_0, θ_1), and the loss function to be
bounded, and to behave like |θ - θ_i|^p just outside the interval. Finally,
we assume an a priori density, bounded between positive numbers in every
finite interval.

Under these assumptions, the evaluation of the second order correction
in Schwarz (1969) goes through, and yields for the size factor by which to
blow up the asymptotic shape

log(1/c) - (p + 1 ± 1/2) log log(1/c).
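
The size factor can be evaluated numerically for both choices of the undetermined sign. The sketch below is an illustration under stated assumptions: it reads the coefficient of the log log term as p + 1 ± 1/2, uses natural logarithms, and the function name and the `sign` parameter are hypothetical.

```python
import math


def corrected_size(c, p, sign=-1):
    """Size factor log(1/c) - (p + 1 + sign/2) * log log(1/c) for k = 1.

    sign is +1 or -1 and selects the sign in front of the 1/2;
    p is the exponent of the loss just outside the indifference interval.
    """
    size = math.log(1.0 / c)
    return size - (p + 1 + 0.5 * sign) * math.log(size)
```

For c = 10^{-8} and p = 1 the two choices give about 14.05 (minus sign) and 11.14 (plus sign), against an uncorrected value of about 18.42, so the correction is far from negligible at this cost level.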

Thus corrected, Fortus' generalization yields an approximation
applicable in the case of a parametric family. For nonparametric problems,
though formally correct, the approximation cannot be corrected, and without
a correction it remains too rough to be of any practical value.

For exponential families, the gap between the constant-risk bounds
of the Bayes regions has been eliminated by Lorden (1967, 1977, 1980),
who showed that for appropriate M*, the Bayes procedure does not stop
as long as R exceeds M* c. This determines the correct sign preceding
the 1/2 in the last formula to be a minus, and reduces the relative
error to O((log(1/c))^{-1}). Hopefully this result, which is best possible
if full dependence on the prior is avoided, can also be extended
beyond exponential families.